


Eliciting implicit assumptions of proofs in the Mizar 
Mathematical Library by property omission 

Jesse Alama






Abstract When formalizing proofs with interactive theorem provers, it often 
happens that extra background knowledge (declarative or procedural) about 
mathematical concepts is employed without the formalizer explicitly invoking 
it, to help the formalizer focus on the relevant details of the proof. In the 
contexts of producing and studying a formalized mathematical argument, such 
mechanisms are clearly valuable. But we may not always wish to suppress 
background knowledge. For certain purposes, it is important to know, as far 
as possible, precisely what background knowledge was implicitly employed 
in a formal proof. In this note we describe an experiment conducted on the 
Mizar Mathematical Library of formal mathematical proofs to elicit one such 
class of implicitly employed background knowledge: properties of functions 
and relations (e.g., commutativity, asymmetry, etc.). 



1 Introduction 

When formalizing mathematical proofs with interactive theorem provers, it 
often happens that extra background knowledge is employed without the for- 
malizer explicitly invoking it. The effect is clear: thanks to such facilities, the 
formalizer can focus more on the relevant details of the proof he is working 
on, rather than (relatively more) tedious "details". In the contexts of
producing and studying a formalized mathematical argument, such mechanisms are
important and deserve to be strengthened. 

But we may not always wish to suppress background knowledge. For certain 
purposes, it is important to know, as far as possible, precisely what background 
knowledge was implicitly employed in a formal proof. There are a number of
practical applications of such information, such as facilitating theory
exploration [?], library recompilation [?], improved machine learning about
formal proofs, and premise selection heuristics. The aim of eliciting exactly
what is used is also important philosophically, because we come closer to
finding what is truly necessary for the success of a theorem. Our project thus
aligns in a modest way with the overall aims of reverse mathematics [?], a
vibrant branch
of contemporary proof theory whose aim is to discover axioms from theorems 
(rather than the other way around). 

The task of eliciting all implicit information of a formalized mathematical 
proof is a stimulating challenge. Various approaches can be used depending on 
the ITP and its associated logic(s). (For a comparison of how this can be done 
for Mizar as compared to Coq, see [?].) In this note we describe an experiment 
conducted on the Mizar Mathematical Library (MML) of formal mathematical 
proofs to elicit one such class of implicitly employed background knowledge: 
properties of functions and relations (e.g., commutativity, asymmetry, etc.). 
We employ the term constructor to mean function or relation in a formal 
mathematical library.^ The MML is a large collection of formal mathematical 
proofs expressed in classical first-order set theory and a natural deduction-style 
declarative proof formalism. (For an introduction to Mizar, see [?].) Analogous 
experiments could clearly be performed on other libraries of formalized math- 
ematical knowledge. The Mizar system is especially attractive for this kind of 
experiment because of the clear semantics of the properties that can be implic- 
itly attached to functions and relations (they are simple universal first-order 
formulas) and the relative ease of manipulating these properties. 

Section 2 outlines some previous related work and sketches some of the 
intended applications of the constructor property dependency data. Section 3 
gives two examples of how constructor properties can be implicitly used in
formal proofs in Mizar. Section 4 describes the method we used to make explicit 
the constructor properties that are needed for Mizar inferences. The heart of 
the paper is Section 5, where we give the results of our computation about 
needed and unneeded constructor properties for the entire Mizar Mathematical 
Library. 

2 Applications and previous work 

Mizar's notion of constructor properties (and another mechanism for
suppressing the premises of an inference, so-called requirements) is a
relatively new invention for Mizar [?]. The information that a constructor
property is needed for an inference can be exploited in various ways. One
potential application is to use the needed-property information as an
indication of what formula shapes are useful in the search for a successful
proof. Building on the ideas of S. Schulz [?], one would strip away the actual
symbols employed in a proof that implicitly relies on some constructor
property, thereby learning some structural information about what contributes
to the success of a theorem. One could profitably use this information to
assist in the problem of selecting candidate premises in large ATP problems
[?]. More generally, such a structural approach can give us some information
about shapes of successful patterns of formal reasoning.

^ The term "constructor" is in fact part of the Mizar idiolect for formal mathematics.
There are more kinds of objects that count as "constructors" in Mizar (such as structures),
but in this paper we are interested only in functions and predicates.

The information that a property of a constructor is not needed indicates,
prima facie, that one is dealing with a kind of generalization. It may not
always be clear, though, how to use this information to craft generalizations.
The problem lies in the familiar distinction in logic between reasoning
formally about a class of structures and reasoning about a single structure. It
would seem that the information that a property is not needed is more useful
in cases of reasoning about classes of structures (e.g., about fields) than in
cases of reasoning about "concrete" mathematical objects (a single structure,
e.g., the real numbers). Thus, we may find in a theorem about real numbers
that commutativity of addition is not used. One could immediately construct
a generalization of such a theorem to a class of structures that are just like
the real numbers, but in which the commutativity of addition is not assumed.
The value of such a generalization may not be clear. By contrast, imagine we
encounter a theorem about fields in which the commutativity of the field's
addition operation is not used. In this case, the generalization procedure is
similar to the procedure used in the case of the reals. Here, though, the
significance and utility of such a generalization are clearer because the
reasoning was already about a wide class of structures, the properties of whose
functions and relations enjoyed some flexibility.

One might imagine an advisor attached to an interactive theorem prover 
that can use the information that a constructor property is needed to help one 
investigate and formulate suitable generalizations. As discussed earlier, one 
relevant task (which seems quite interesting from the perspective of artificial 
intelligence) would be to decide whether a generalization is even warranted. 
It's intuitively clear that when we use a mathematical concept in a proof we 
use only some aspects of it and not others; at some steps, we require one thing 
of a concept, but at other steps we require something else. Dividing a concept 
so that we never require from it anything less than its "full content" every time
it is used seems like a drastic suggestion. Still, we may find value in a mitigated 
form of this advice. 



3 Omitting properties of functions and relations 

In Mizar, one can attach properties to functions and relations; Tables 1 and 2 
lay out all nine properties currently supported by Mizar. These properties are
used by the Mizar verifier as "background knowledge" that doesn't need to be 
cited by a formalizer. 

Relation property    Formula

reflexivity          $\forall x\,[R(x,x)]$
symmetry             $\forall x \forall y\,[R(x,y) \rightarrow R(y,x)]$
asymmetry            $\forall x \forall y\,[R(x,y) \rightarrow \neg R(y,x)]$
connectedness        $\forall x \forall y\,[R(x,y) \vee R(y,x)]$
irreflexivity        $\forall x\,[\neg R(x,x)]$

Table 1 Properties of relations in Mizar








Function property    Formula

projectivity         $\forall x\,[f(f(x)) = f(x)]$
involutiveness       $\forall x\,[f(f(x)) = x]$
idempotence          $\forall x\,[g(x,x) = x]$
commutativity        $\forall x \forall y\,[g(x,y) = g(y,x)]$

Table 2 Properties of functions in Mizar
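
The formulas in Tables 1 and 2 can be made concrete by checking them on small finite models. The following Python sketch is purely illustrative and is not part of Mizar or of the experiment described in this note: the function names and the four-element example domain are assumptions introduced here. Relations are modelled as sets of ordered pairs and functions as Python callables.

```python
from itertools import product

# Relation properties (Table 1). R is a set of ordered pairs over `domain`.
def reflexive(R, domain):
    return all((x, x) in R for x in domain)

def symmetric(R, domain):
    return all((x, y) not in R or (y, x) in R
               for x, y in product(domain, repeat=2))

def asymmetric(R, domain):
    return all((x, y) not in R or (y, x) not in R
               for x, y in product(domain, repeat=2))

def connected(R, domain):
    return all((x, y) in R or (y, x) in R
               for x, y in product(domain, repeat=2))

def irreflexive(R, domain):
    return all((x, x) not in R for x in domain)

# Function properties (Table 2). f is unary, g is binary.
def projective(f, domain):
    return all(f(f(x)) == f(x) for x in domain)

def involutive(f, domain):
    return all(f(f(x)) == x for x in domain)

def idempotent(g, domain):
    return all(g(x, x) == x for x in domain)

def commutative(g, domain):
    return all(g(x, y) == g(y, x) for x, y in product(domain, repeat=2))

# Hypothetical example domain: all subsets of {0, 1}.
subsets = [frozenset(s) for s in ([], [0], [1], [0, 1])]
proper_subset = {(a, b) for a in subsets for b in subsets if a < b}
subset = {(a, b) for a in subsets for b in subsets if a <= b}

def union(a, b):
    return a | b

def complement(a):
    return frozenset({0, 1}) - a
```

On this toy domain the proper subset relation passes the irreflexivity and asymmetry checks, matching the two properties attached to c< in Example 1 below, while plain inclusion is reflexive but not connected, since {0} and {1} are incomparable.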








3.1 Example 1: Relation property 








Consider the definition of the proper subset relation:

    definition
      let X,Y be set;
      pred X c< Y means
        X c= Y & X <> Y;
      irreflexivity;
      asymmetry;
    end;

This example, taken from the Mizar article XBOOLE_0, defines the predicate 
(pred) of one set X being a proper subset c< of another Y. The symbols c= 
and <> denote the subset relation and disequality, respectively. The keywords 
irreflexivity and asymmetry included in the definition indicate that the 
proper subset relation will henceforth have the properties of irreflexivity and 
asymmetry; inferences involving the proper subset relation will implicitly use 
these properties. 

There are 58 occurrences in the Mizar Mathematical Library in which the 
irreflexivity of the proper subset relation is implicitly used. Here is one exam- 
ple, taken from the article TREES_1: 

    theorem
      not <*n*> is_a_proper_prefix_of <*m*>
    proof
      assume A1: not thesis;




The theorem says that the one-term sequence whose sole term is the number n
is not a proper prefix of the one-term sequence whose sole term is the number
m. The binary relation is_a_prefix_of on finite sequences is defined simply
as set inclusion. The reference Th16 refers to the theorem that says that a
one-term finite sequence ⟨s⟩ is a prefix of the one-term finite sequence ⟨t⟩
iff s = t. The proof of our theorem goes by contradiction. Note that the symbol
for the proper subset relation, c<, occurs neither in the theorem nor in the
proof. In the absence of the assumption that c< is irreflexive, the
contradiction at the end of the proof does not follow: we get from Th16 that
n = m, but this is compatible with the one-term sequence <*n*> being a proper
subset of <*m*> if we have not assumed that the proper subset relation is
irreflexive.



3.2 Example 2: Function property 

Additive magmas come equipped, of course, with an addition operation +. In
the case of abelian additive magmas, we know that + is commutative:

    definition
      let V be Abelian addMagma,
          v be Element of V,
          w be Element of V;
      redefine func v + w;
      commutativity;
    end;




This example is taken from the Mizar article RLVECT_1. The keyword redefine 
here does not indicate that we are changing the definiens of the binary function 
+ on elements of additive magmas (which in any event is essentially undefined);
rather, we are attaching the property of commutativity to +. Such an
operation is obviously admissible because of the definition of what it means for an
additive magma to be abelian. 

As an example of an inference that implicitly depends on the commutativity 
of +, consider: 

    theorem
      for L being add-associative right_zeroed
        right_complementable Abelian
        non empty addLoopStr,
        b, c being Element of L
      holds c = b - (b - c)

        non empty addLoopStr,
        b, c be Element of L;
      set a = b - c;
      a+c-a = c-a+a by RLVECT_1:28
        .= c by Th1;




Ignoring the definition of all the attributes that are being attached to the type
addLoopStr (viz., an additive Moufang loop structure), the crucial step here
is the equation a+c-a = c-a+a. Note that the terms a and c are being swapped
without reference to the commutativity of + (the reference to the theorem
RLVECT_1:28 is not relevant here).



4 Eliciting needed implicit constructor properties 

To elicit the constructor properties that are needed for an item of the Mizar 
Mathematical Library, we exploit Mizar's separation of (i) the construction of 
the environment in which verification will be carried out from (ii) the process 
of verification properly speaking. 

In step (i), Mizar constructs an environment for verification, importing all 
constructors that occur explicitly or implicitly in a Mizar text. If a constructor 
has a property associated with it, the environment will contain the property 
attached to the constructor, regardless of whether it is truly needed. The 
environment is thus a conservative overestimate of what is truly needed for 
the verification to succeed. By intervening between the construction of the
environment and the verification proper, one can manipulate the environment 
in which the verification is carried out. We simply remove a property attached 
to a constructor and carry out the verification: if the verification succeeds, we 
know that the property of the constructor was not actually needed.

Thanks to the use of XML as the representation of the environment for 
Mizar articles [?], conducting our experiment is as simple as applying certain 
XSL stylesheets to the environment files. 
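
The omission procedure can be sketched in a few lines. The experiment itself applies XSL stylesheets to Mizar's XML environment files; the Python version below swaps in `xml.etree.ElementTree` purely for illustration. The element and attribute names (`Environment`, `Constructor`, `Property`, `name`, `kind`) are invented for this sketch and are not Mizar's actual XML schema, and `toy_verifier` is a hypothetical stand-in for a run of the Mizar verifier on the modified environment.

```python
import copy
import xml.etree.ElementTree as ET

# Toy environment. The tag and attribute names are assumptions made for
# this sketch; they do not reproduce Mizar's real environment format.
ENV_XML = """
<Environment>
  <Constructor name="proper_subset">
    <Property kind="irreflexivity"/>
    <Property kind="asymmetry"/>
  </Constructor>
  <Constructor name="plus">
    <Property kind="commutativity"/>
  </Constructor>
</Environment>
"""

def attached_properties(env):
    """List every (constructor, property) pair present in the environment."""
    return [(c.get("name"), p.get("kind"))
            for c in env.iter("Constructor")
            for p in c.findall("Property")]

def without_property(env, constructor, kind):
    """Return a copy of the environment with one property detached."""
    pruned = copy.deepcopy(env)
    for c in pruned.iter("Constructor"):
        if c.get("name") == constructor:
            for p in c.findall("Property"):
                if p.get("kind") == kind:
                    c.remove(p)
    return pruned

def directly_needed(env, verify):
    """Detach each property in turn; it is needed iff verification then fails."""
    return [(constructor, kind)
            for constructor, kind in attached_properties(env)
            if not verify(without_property(env, constructor, kind))]

def toy_verifier(env):
    # Hypothetical verifier: the item checks only if the irreflexivity of
    # proper_subset is still attached (compare Example 1).
    return ("proper_subset", "irreflexivity") in attached_properties(env)

env = ET.fromstring(ENV_XML)
needed = directly_needed(env, toy_verifier)
```

Each property is removed from a fresh copy of the environment, so every test is made against an environment that differs from the original in exactly one attachment.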

Rather than operating on whole Mizar articles, which generally contain 
dozens if not hundreds of toplevel items, we operate on individual theorems of
the MML. This is made possible by dividing the MML into fine-grained "items"
(which are in fact valid Mizar "microarticles"); see [?] for more information on
how this decomposition of the MML is carried out. 

5 Usage of properties throughout the MML 

We have so far said that a verifiable item I of the Mizar Mathematical Library
depends on property P of constructor C just in case I, in the absence of
the attachment of P to C, is not verifiable. This definition of dependence
upon a constructor property fails to capture the dependence of one item upon
another. For example, suppose that an item I does not depend on property P
of constructor C, that is, I is verifiable if one detaches from C the property
P. Suppose, though, that item I depends on some other item I' which does
need property P of C. Such a thought experiment suggests that we distinguish
two senses of "need": direct and indirect.

Property          Direct items    Indirect items

reflexivity       54113           102426
symmetry          29744           97220
asymmetry         256             82585
connectedness     5020            83083
irreflexivity     91              65951
projectivity      153             10002
involutiveness    533             67853
idempotence       535             70132
commutativity     14055           92580

Table 3 Direct and indirect dependence upon properties in the MML

Definition 1 Item I directly needs property P of constructor C iff
verification of I will fail if P is detached from C.

Item I indirectly needs property P of C iff I directly needs P of C or
there exists an item I' such that I depends on I' and I' needs P of C.
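
Indirect need can be computed by a graph traversal once the direct needs and the item-to-item dependencies are known. The sketch below is illustrative only. It assumes that "needs" in the second clause of Definition 1 is read recursively, so that indirect need is the union of direct needs over everything reachable in the dependency graph. The item names and the two dictionaries are hypothetical data, not figures from the MML.

```python
def indirect_needs(item, direct, depends_on):
    """Collect the (constructor, property) pairs indirectly needed by `item`.

    direct:     maps an item to the set of pairs it directly needs
    depends_on: maps an item to the set of items it depends on
    """
    needs, seen, stack = set(), set(), [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # each item is visited once, even if reached twice
        seen.add(current)
        needs |= direct.get(current, set())
        stack.extend(depends_on.get(current, ()))
    return needs

# Hypothetical three-item library: T3 depends on T2, which depends on T1.
direct = {
    "T1": {("in", "asymmetry")},
    "T2": {("c=", "reflexivity")},
    "T3": set(),
}
depends_on = {"T3": {"T2"}, "T2": {"T1"}}

t3_needs = indirect_needs("T3", direct, depends_on)
```

In this toy library T3 directly needs nothing, yet it indirectly needs both properties because they are needed further down its dependency chain. Table 3 shows the same pattern at scale, for instance for asymmetry.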

Both senses of "need" are useful. Table 3 gives statistics about various 
constructor properties that are needed directly and indirectly in the MML. 
There are a number of fascinating details behind these statistics: 

— Reflexivity is directly needed by nearly half of the items of the Mizar Math- 
ematical Library, and is indirectly needed by nearly the entire library. Re- 
flexivity of equality of sets accounts for this: it is indirectly needed by fully 
102242 items. It is perhaps not surprising that such a fundamental property 
of a built-in logical notion pervades the library. 

Putting aside equality, the next most important reflexive constructor is 
subset inclusion, which is indirectly needed by 93284 items. A redefinition 
of subset inclusion for ordinals is indirectly used by 8279 items. Putting 
aside these "logical" or "set theoretical" examples, the most important 
reflexive "mathematical" relation is the less-than-or-equal-to relation ≤ on
(extended) real numbers, whose reflexivity is indirectly needed by 67196 
items. 

— Irreflexivity is directly needed by only a handful of items in the library, 
but indirectly it supports about 2/3 of the library. The explanation is the 
proper subset relation: the irreflexivity of this constructor is needed by 
65546 items. The most important "mathematical" example is the relation 
of one element of a relation being strictly less than another. 

— Asymmetry is attached to only five constructors in the entire MML:

— ∈ (set membership);

— the proper subset relation;

— a variant of ∈, defined for many-sorted set structures;

— the strictly-lexicographically-less-than relation on finite tuples of
natural numbers;

— the strictly-lexicographically-less-than relation on bags of ordinals.

The asymmetry of ∈ is foundationally significant in the sense of
mathematical logic (it expresses a weak form of the axiom of foundation) and for
the Mizar Mathematical Library. The asymmetry of this constructor alone
accounts for essentially all items that need the asymmetry of any
constructor: 82581 items indirectly need this weak form of foundation, whereas only
283 items directly depend on this property of ∈.^

— Likewise, projectivity is rarely directly needed, but supports a substantial 
piece of the library. Interestingly, it is the closure operation defined on 
subsets of a topological space^ that accounts for the lion's share of the 
items that indirectly need a projective constructor: 7536 items depend on
the projectivity of the closure operation. 

— Connectedness is attached to very few constructors of the Mizar Mathemat- 
ical Library, but the constructors to which this property is attached have a 
significant influence across the MML. The constructor whose connectedness 
is used indirectly by the greatest number of items is the subset relation, 
restricted to ordinals. The connectedness of this constructor expresses a 
rather significant fact about ordinals (any two ordinals are comparable). 
The proof of this fact in Mizar uses a trichotomy-like principle for ordinals, 
saying that for any two ordinals A and B, either A ∈ B, A = B, or B ∈ A.
The connectedness of the subset relation on ordinals is indirectly needed by
82490 items. The next most significant example is ≤ on rational numbers,
indirectly needed by 71313 items. 

— Involutiveness also requires some explanation. There are two items that vie
for the most important here:

— The sign-changing operation x ↦ −x on real numbers is needed by
65501 items.

— The reciprocal operation z ↦ 1/z on complex numbers is needed by
65105 items.

The constructor with the next highest number of items that indirectly
depend on its involutiveness is the relative complement operation, which
is indirectly needed by 8847 items.

— The constructor whose idempotence is most frequently needed indirectly is 
the binary union of two sets, which is indirectly needed by 69184 items. The 
idempotence of binary set intersection takes second place: it is indirectly 
needed by 24249 items. 



^ This is arguably a curiosity of the organization of the MML. It turns out that asymmetry
of ∈ is used very early on in the logical construction of the MML, hence its outsized influence.

^ To be precise, the operation is defined not on topological spaces, but on topological 
structures, which one can think of simply as a class of structures that has a carrier and a 
topology, viz., a collection of subsets of the carrier. A topological structure on its own is not 
assumed to have any properties beyond these. 
Several items in the Mizar Mathematical Library indirectly depend on more
than 100 constructor properties. The item with the greatest number of 
dependencies is taken from the proof of the Jordan curve theorem: 



    theorem
      for n being Element of NAT
      for C being being_simple_closed_curve
        Subset of (TOP-REAL 2)
      st n is_sufficiently_large_for C
      holds cell(Gauge(C,n),
                 X-SpanStart(C,n) -' 1,
                 Y-SpanStart(C,n))
        misses C
    proof
      assume n is_sufficiently_large_for C;
      then cell(Gauge(C,n),
                X-SpanStart(C,n) -' 1,
                Y-SpanStart(C,n)) c= BDD C by Th6;
      hence thesis by JORDAN1A:15, XBOOLE_1:63;
    end;




The precise mathematical definitions, the theorem references (Th6, JORDAN1A:15,
etc.) and the proof here are not important. What is important is that we 
are dealing here with a very short proof (it has only three steps) of a the- 
orem along the way to a substantial landmark in formal mathematics. De- 
spite appearances, this theorem indirectly depends on fully 113 constructor 
properties. 

The formalization in Mizar of the Jordan curve theorem required a great 
deal of work and made heavy use of the Mizar system's features, such as 
its support for constructor properties under discussion here. We see this by 
virtue of the fact that the theorems of the series of Mizar articles leading 
to the final proof of the Jordan curve theorem indirectly need, on average, 
several dozen constructor properties. 



6 Conclusion and future work 

The abstractions discovered here in the context of an interactive theorem 
prover could, in all probability, be discovered equally well by an ATP. With 
an automated theorem prover one could, in principle, go much further than 
we have gone here. An infrastructure for carrying out such exploration (sound 
translation of Mizar proofs to a vanilla unsorted first-order language, infras- 
tructure for constructing and working with the associated ATP problems, etc.) 
already exists [?]. One could even verify the dependency claims made here
outside of Mizar, in the style of [?].

When we discover that some property of a function or relation is needed, we 
are discovering not that the property is logically or mathematically needed for 
the success of the theorem in question; instead, we are discovering only that
there exists a formal deduction (viz., a Mizar proof) where this property is 
used. It is certainly quite possible that there would exist other formal proofs 
of the same theorem that do not use the property. This suggests a much 
more ambitious project: discovering new proofs of theorems that use much 
less than what an interactively constructed proof uses. Such prospects are 
enticing, and deserve to be carried out. Such a project naturally also bears on 
the philosophical problem of when two proofs are "the same": it could very 
well be that there are two proofs of the same theorem, one which exploits 
a property of a function or relation, and another which doesn't, but which
should nonetheless, from another perspective, be considered identical. 

Constructor properties are but one mechanism in Mizar that hides premises 
of inferences. Mizar also supports so-called requirements, which likewise
allow one to reason validly without having to be explicit about precisely what
premises are needed for every inference of a proof. Mizar also has some built-in 
functionality concerning arithmetic. Both of these mechanisms are of great 
value for the formalizer when constructing a formal proof, but if one is in- 
terested in making explicit premises that were suppressed during proof con- 
struction, Mizar's requirements and arithmetic facilities need to be taken into 
account. 

One might wonder why there are only the nine constructor properties
supported by Mizar. Binary relation transitivity, unary function surjectivity and
injectivity, for example, are conspicuously absent. Supporting such additional
function and relation properties could be quite valuable; Mizar itself and tools
using its library could exploit function injectivity and surjectivity, for
example, to help rule out the solution for certain search problems that require the
domain of discourse to be finite (see [?]). But more generally, it would be
valuable to mine the Mizar Mathematical Library for common shapes of formulas
that play a large inferential role, and which could naturally be promoted to 
the level of constructor properties. One might discover fruitful properties that 
could help make formalization in Mizar even more appealing and practical. 



